NONLINEAR MODELING OF GENE NETWORKS FROM
TIME SERIES GENE EXPRESSION DATA



Related Application: 

This application claims priority under 35 U.S.C. § 119(e) to United States Provisional
Application Serial No: 60/427,448 filed November 19, 2002. This application is herein
incorporated fully by reference.

Field of the Invention 

This invention relates to the use of Bayesian models with nonparametric regression to infer
network relationships between genes from time series studies of gene expression. In particular, the
invention relates to methods involving minimizing a criterion, BNRC_dynamic, to infer optimal network
relationships.

BACKGROUND 

One of the most important aspects of current research and development in the life sciences, 
medicine, drug discovery and development and pharmaceutical industries is the need to develop 
methods and devices for interpreting large amounts of raw data and drawing conclusions based on 
such data. Bioinformatics has contributed substantially to the understanding of systems biology and 
promises to produce even greater understanding of the complex relationships between components 
of living systems. In particular, with the advent of new methods for rapidly detecting expressed
genes and for quantifying expression of genes, bioinformatics can be used to predict potential
therapeutic targets even without knowing with certainty the exact roles a particular gene or genes may
play in the biology of an organism.

Simulation of genetic systems is a central topic of systems biology. Because simulations 
can be based on biological knowledge, a network estimation method can support biological 
simulation by predicting or inferring previously unknown relationships.

In particular, development of microarray technology has permitted studies of expression
of a large number of genes from a variety of organisms. A very large amount of raw data can be
obtained from a number of genes from an organism, and gene expression can be studied following
intervention by mutation, disease or drugs. Finding that a particular gene's expression is
increased in a particular disease or in response to a particular intervention may lead one to believe
that that gene is directly involved in the disease process or drug response. However, in biological
organisms genes rarely are independently regulated by any such intervention, in that many genes
can be affected by a particular intervention. Because a large number of different genes may be so
affected, understanding the cause and effect relationships between genes in such studies is very
difficult. Thus, much effort is being expended to develop methods for determining cause and effect
relationships between genes, which genes are central to a biological phenomenon, and which genes'
expression is peripheral to the biological process under study. Although expression of such a peripheral
gene may be useful as a marker of a biological or pathophysiological condition, if such a gene
is not central to physiological or pathophysiological conditions, developing drugs based on such
genes may not be worth the effort. In contrast, for genes identified to be central to a process,
development of drugs or other interventions may be crucial to developing treatments for conditions
associated with altered expression of genes.

Development of Bayesian network analysis for estimating a gene network from microarray
gene expression data has received considerable attention and many successful investigations have
been reported (Friedman et al. [13]; Imoto et al. [14]; Pe'er et al. [18]; and our own work, U.S.
Patent Application Serial No: 10/259,723, herein incorporated fully by reference).

However, a shortcoming of traditional Bayesian network models is that they cannot
construct cyclic networks, while certain real gene regulatory mechanisms have cyclic components.
Recently, a dynamic Bayesian network model (Bilmes et al. [3]; Friedman et al. [12]; Someren
et al. [19]) has been proposed for constructing a gene network with cyclic regulatory components.
A dynamic Bayesian network is based on time series data, and usually the data are discretized into
several classes. Thus, a dynamic network model can depend on the setting of the thresholds for
the discretizing process, and unfortunately, the discretization can lead to loss of information. Imoto
et al. [14, 15] proposed a network estimation method based on a Bayesian network and
nonparametric regression as a solution that avoids discretization and captures nonlinear relations
among genes. However, Bayesian networks and nonparametric regression models [14, 15] still
may not adequately solve networks having cyclic regulatory components.

SUMMARY 

In certain embodiments, this invention includes the use of time-series expression data in a 
Bayesian network model with nonparametric regression. Using time series expression data, we 
can identify cyclic regulatory components. In other embodiments, time delay information can be 
incorporated into a Bayesian/nonparametric regression model, which then can extract even 
nonlinear relations among genes. In certain of these embodiments, an ordinary differential equation
model can be used as an alternative. We also have developed new criteria for choosing an optimal
network from a Bayesian statistical point of view. Such criteria can optimize a network structure 
based on data having noise. 

BRIEF DESCRIPTION OF THE FIGURES 

This invention is described with reference to specific embodiments thereof. Additional
aspects of the invention are found in the Examples and in the Figures, in which:

Figure 1 depicts a schematic illustration of time dynamics in gene expression. 

Figures 2a and 2b depict diagrams of network relationships of genes involved in cell cycle 
regulation in yeast, compiled in KEGG. 

Figure 2a depicts genes in cyclin-dependent protein kinase pathways. 

Figure 2b depicts network relationships between genes described in Figure 2a involved in 
regulating cyclin-dependent protein kinases. 

Figures 3a - 3c depict diagrams of network relationships of yeast genes involved in 
metabolic pathways.

Figure 3a depicts several genes involved in metabolic pathways.

Figure 3b depicts network relationships between genes described in Figure 3a derived 
from a Bayesian/nonparametric regression model. 

Figure 3c depicts network relationships between genes described in Figure 3a derived from 
a dynamic Bayesian/nonparametric regression model. 

DETAILED DESCRIPTION 
In general, a dynamic Bayesian network model can be obtained using any suitable method 
for determining gene expression. In certain embodiments, microarray experiments are desirable 
because a large number of genes can be studied from a single sample applied to the array, making 
relative differences in gene expression easy to determine. It may be desirable to improve accuracy
of microarray methods by subtracting background signals from the signal reflecting true gene
expression and/or correcting for inherent differences in labels used to measure gene expression
(e.g., Cy3/Cy5).

Using a Bayesian network framework, we consider a gene as a random variable and
decompose the joint probability into the product of conditional probabilities. For example, if we
have a series of observations of the random vector, the probability of obtaining a
given observation can depend upon the conditional probability densities. In certain embodiments,
one can use nonparametric regression models for capturing the relationships between the variables.
A variety of tools can be used to elucidate the relationships. For example, polynomials,
Fourier series, regression spline bases, B-spline bases, wavelet bases and the like can be used for
defining a graph of gene relationships. Certain methods to elucidate network relationships are
disclosed in U.S. Patent Application Serial No: 10/259,723, herein incorporated fully by reference.
One difficulty in selecting a proper graph is properly evaluating variance and noise in the system.

In some embodiments of this invention, networks can be constructed using Bayesian
estimation with nonparametric regression using data from time series studies. In many gene
networks, an intervention leads to alteration in expression of certain genes before alterations in
other genes are observed. One may infer that changes in expression of certain genes that
occur later in time after an intervention may be causally related to genes whose expression changes
early. Time series information is useful to define "early" genes and "late" genes. It is unlikely that an alteration
in expression of a late gene could be a cause of an alteration of expression of an early gene, whose
expression is altered sooner in time than that of a late gene. Although this presumption may not
apply in all cases, it is more probable that early genes are "upstream" in a network than
are late genes, which are more likely to be "downstream" genes. Therefore, time relations of gene
expression can be useful to modify Bayesian estimation and nonparametric regression to provide
a more reliable network solution.

In aspects of this invention, we extend the Bayesian network and nonparametric regression 
model to a dynamic Bayesian network model, which can be used to construct cyclic relationships 
when one has time series gene expression data. Information on time delay between changes in 
gene expression can be included in a model easily, and the model can extract even nonlinear 
relations among genes easily. 

In certain embodiments, for constructing a gene network with cyclic regulatory
components, an ordinary differential equation model (Chen et al. [5]; de Hoon et al. [8]) can be used.
However, this model is based on a linear system and may be unsuitable for capturing complex
phenomena. We have derived a new criterion for choosing an optimal network from a Bayesian
statistical point of view [2]. The criterion can optimize network structure, giving the best
representation of the gene interactions described by noisy data. The new criterion is herein
termed BNRC_dynamic.

BNRC_dynamic can be evaluated using a first-order Markov relation as illustrated in Figure
1. In such a relationship, an upstream gene X_1 is depicted as having an effect (right arrow) on one
or more downstream genes X_2, which has an effect on X_3 (not shown), and so on, until an effect
on X_n is observed. In situations in which X_1 has no "upstream" gene of its own, X_1 is termed a
"parent" gene within the network. Genes under influence of a parent gene are termed "target"
genes. Note that the use of "target gene" in this context is not to be confused with a gene that is
a target for intervention, such as by a potential drug. In fact, parent genes may be targets for
therapeutic intervention. Under this scheme, an effect on X_n cannot be observed until effects on
X_1, X_2, etc. have been elicited. Note that Figure 1 illustrates a "series" cause/effect relationship
in which no parallel or feedback systems are present, whereas in many genetic systems, there are series
effects and "parallel" effects, in which two or more genes can be affected by an upstream
gene and/or can themselves affect a downstream gene. Moreover, circular effects ("feedback")
can be present, in which a gene X_a can affect another gene X_b, which can affect X_c, which itself can
affect X_a (or X_b). Moreover, such feedback may be either "positive", in which X_c stimulates X_a, or
"negative", in which X_c inhibits X_a. Further complexities can arise in situations in which series,
parallel, positive feedback and negative feedback relationships are all present.

In general, relationships between time points may be arbitrary, but in some cases it can be
advantageous to use pre-selected time points based on knowledge of the biological effects of the
genes and their expression dynamics under study. Under first order conditions, a joint probability
can be decomposed as shown in equation (1) in Example 1 below. A conditional probability can
then be decomposed into the product of conditional probabilities using equation (2) in Example 1.
Equations (1) and (2) also hold when a density function is used instead of a probability
measure. Therefore, the dynamic Bayesian network can be represented, for example, using
densities described in Example 1 to arrive at the local network structure of a gene and its parent
genes according to equation (3) in Example 1.

A dynamic Bayesian model with nonparametric regression can be applied, for example,
as described in Example 2. Once experimental data are collected, the solution to the network can
be considered to be a statistical model selection problem. In certain embodiments, we can solve
this problem using a Bayesian approach and derive a criterion for evaluating the goodness of the
dynamic Bayesian network and nonparametric regression methods. Assuming a prior distribution,
marginal likelihood and posterior probability can be determined according to equation (4) in
Example 2. Subsequent construction of a genetic network involves computation of a high
dimensional integral as depicted in equation (4). In some embodiments, a Laplace method for
integrals, for example, can be used to approximate the integral. Therefore, the criterion
BNRC_dynamic as shown in equation (5) in Example 2 can be obtained.

To apply BNRC_dynamic to an experimental system, cDNA microarray data, for example,
can be obtained experimentally at a number of time points after affecting the genetic system. To
smooth curves, we can use spline functions, for example B-splines as depicted in Example 3.
BNRC_dynamic can be decomposed according to equation (6) in Example 3. Optimal network
relationships are obtained when BNRC_dynamic is minimized.

Using dynamic Bayesian network models with nonparametric regression and the criterion
BNRC_dynamic, we can formulate a network learning process. However, determining which genes
are parent genes and which are target genes can be time consuming when all possible gene
combinations and relationships are considered. To reduce the number of analyses needed, we can
select candidate parent genes. Subsequently a greedy hill-climbing algorithm can be used.
BNRC_dynamic is calculated and then a parent gene is either added or deleted, and
BNRC_dynamic is re-evaluated according to Step 2 in Example 3. The process is repeated until an
appropriate convergence is found. Then, the order of computation is permuted and BNRC_dynamic
is re-evaluated. The optimal network gives the smallest BNRC_dynamic.

A specific illustration of the above methods is shown in Example 4 and in Figures 2a and 2b.
The efficiency of the methods is shown through analysis of gene expression data from
Saccharomyces cerevisiae. Figure 2a depicts a group of S. cerevisiae genes involved in
regulation of the cell cycle. The genes are depicted as grouped based on the overall metabolic
pathways involved and focus on the cyclin-dependent protein kinase gene (YBR160w). Note that
the parent/target gene network relationships cannot be determined from Figure 2a. In contrast, using
methods of this invention, network relationships of those genes can be evaluated and are depicted
in Figure 2b.

Another example is depicted in Figures 3a - 3c. Figure 3a depicts genes involved in
metabolic pathways. Figure 3a shows no gene network relationships. Figure 3b depicts a network
solution obtained using Bayesian network analysis with nonparametric regression, but without
consideration of BNRC_dynamic. Figure 3c depicts a network solution obtained by minimizing
BNRC_dynamic. Note that in Figure 3c, the network relationships are simpler, and compared to those
depicted in Figure 3b, there are many fewer false positive relationships ("x").

Boundaries between groups of genes in a network can be determined using methods
known in the art, for example, bootstrap methods. Such methods include determining the intensity
of an edge using the following steps:

(1) providing a bootstrap gene expression matrix by randomly sampling a number of
times, with replacement, from the original gene library expression data;

(2) estimating the genetic network for gene_i and gene_j;

(3) repeating steps (1) and (2) T times, thereby producing T genetic networks; and

(4) calculating the bootstrap edge intensity between gene_i and gene_j as (t_1 + t_2)/T.
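
By way of illustration only, the bootstrap steps above can be sketched in Python as follows. The sketch assumes that t_1 and t_2 count the bootstrap networks containing the edge from gene i to gene j and the edge from gene j to gene i, respectively, which is one natural reading of the edge intensity in step (4). The function name, the `estimate_network` callback and its return format (a set of (parent, target) pairs) are hypothetical and are not part of the methods described herein.

```python
import random


def bootstrap_edge_intensity(data, estimate_network, i, j, T=100, seed=0):
    """Return (t1 + t2) / T for the edge between gene i and gene j.

    data: list of observations (rows); for time series data each row is
          assumed to be one already-paired observation.
    estimate_network: hypothetical callback mapping a resampled data set
          to a set of (parent, target) index pairs.
    """
    rng = random.Random(seed)
    n = len(data)
    t1 = 0  # assumed: number of bootstrap networks containing i -> j
    t2 = 0  # assumed: number of bootstrap networks containing j -> i
    for _ in range(T):
        # Step (1): resample the observations with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Step (2): estimate a network from the bootstrap sample.
        edges = estimate_network(sample)
        t1 += (i, j) in edges
        t2 += (j, i) in edges
    # Step (4): bootstrap edge intensity.
    return (t1 + t2) / T
```

In such a sketch, edges whose intensity falls below a chosen threshold could be treated as unreliable; the threshold itself would have to be selected by the practitioner.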

Advantages of the new methods compared with other network estimation methods such
as Bayesian and Boolean networks include: (1) time information can be incorporated easily; (2)
microarray data can be analyzed as continuous data without extra data pre-treatments such as
discretization; and (3) fewer false positive relationships are found. Even nonlinear relations can be
detected and modeled by embodiments of this invention. Methods of this invention are useful for
analyzing genetic networks and for development of new pharmaceuticals which target, in particular,
genes that control expression of important genes. Thus, methods of this invention can
decrease the time needed to identify drug targets and therefore can decrease the time needed to
develop new treatments.

Other aspects of methods of this invention are described in the Examples below. 

EXAMPLES 

The examples presented below represent specific embodiments of this invention. Other
aspects of the invention can be developed by persons of ordinary skill in the art without undue
experimentation. All such embodiments are considered part of this invention.

Example 1: Bayesian Network and Nonparametric Regression



Suppose that we have an n x p microarray gene expression data matrix X, 
where n and p are the numbers of microarrays and genes, respectively. Usually, 
the number of genes p is much larger than the number of microarrays, n. In the 
estimation of a gene network based on the Bayesian network, a gene is considered 
as a random variable. When we model a gene network by using statistical models 
described by the density or probability function, the statistical model should 
include p random variables. However, we have only n samples and n is usually 
much smaller than p. In such case, the inference of the model is quite difficult or 
impossible, because the model has many parameters and the number of samples 
is not enough for estimating the parameters. The Bayesian network model has 
been advocated in such modeling. 

In the context of the dynamic Bayesian network, we consider the time series
data and the ith row vector $x_i$ of X corresponds to the states of p genes
at time i. As for the time dependency, we consider the first order Markov
relation described in Figure 1. Under this condition, the joint probability can be
decomposed as follows:

$$P(X_{11}, \ldots, X_{np}) = P(X_1)\,P(X_2 \mid X_1) \times \cdots \times P(X_n \mid X_{n-1}), \qquad (1)$$

where $X_i = (X_{i1}, \ldots, X_{ip})$ is a random variable vector of p genes at time i. The
conditional probability $P(X_i \mid X_{i-1})$ can also be decomposed into the product
of conditional probabilities of the form

$$P(X_i \mid X_{i-1}) = P(X_{i1} \mid P_{i-1,1}) \times \cdots \times P(X_{ip} \mid P_{i-1,p}), \qquad (2)$$

where $P_{i-1,j}$ is the state vector of the parent genes of the jth gene at time i - 1.
Equations (1) and (2) hold when we use the density function instead of
the probability measure. Hence, the dynamic Bayesian network can then be
represented by using densities as follows:

$$\begin{aligned}
f(x_{11}, \ldots, x_{np}) &= f_1(x_1)\, f_2(x_2 \mid x_1) \times \cdots \times f_n(x_n \mid x_{n-1}) \\
&= f_1(x_1) \prod_{i=2}^{n} g_1(x_{i1} \mid p_{i-1,1}) \times \cdots \times g_p(x_{ip} \mid p_{i-1,p}) \\
&= f_1(x_1) \prod_{i=2}^{n} \prod_{j=1}^{p} g_j(x_{ij} \mid p_{i-1,j}).
\end{aligned}$$

Here we have the decomposition from (2)

$$f_i(x_i \mid x_{i-1}) = g_1(x_{i1} \mid p_{i-1,1}) \times \cdots \times g_p(x_{ip} \mid p_{i-1,p}),$$
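where $p_{i-1,j} = (p_{i-1,1}^{(j)}, \ldots, p_{i-1,q_j}^{(j)})$ is a $q_j$-dimensional observation vector of
parent genes.

For modeling the relationship between $x_{ij}$ and $p_{i-1,j}$, we use the nonparametric
additive regression model as follows:

$$x_{ij} = m_{j1}(p_{i-1,1}^{(j)}) + \cdots + m_{jq_j}(p_{i-1,q_j}^{(j)}) + \varepsilon_{ij},$$

where $\varepsilon_{ij}$ depends independently and normally on mean 0 and variance $\sigma_j^2$. Here,
$m_{jk}(\cdot)$ is a smooth function from R to R and can be expressed by using the linear
combination of basis functions

$$m_{jk}(p_{i-1,k}^{(j)}) = \sum_{m=1}^{M_{jk}} \gamma_{mk}^{(j)}\, b_{mk}^{(j)}(p_{i-1,k}^{(j)}), \qquad k = 1, \ldots, q_j,$$

where $\gamma_{1k}^{(j)}, \ldots, \gamma_{M_{jk}k}^{(j)}$ are unknown coefficient parameters and
$\{b_{1k}^{(j)}(\cdot), \ldots, b_{M_{jk}k}^{(j)}(\cdot)\}$ is the prescribed set of basis functions. Then we define a
dynamic Bayesian network and nonparametric regression model of the form

$$f(x_{11}, \ldots, x_{np}; \theta_G) = f_1(x_1) \prod_{i=2}^{n} \prod_{j=1}^{p} g_j(x_{ij} \mid p_{i-1,j}; \theta_j),$$

with

$$g_j(x_{ij} \mid p_{i-1,j}; \theta_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left[ -\frac{\{x_{ij} - \mu(p_{i-1,j})\}^2}{2\sigma_j^2} \right],$$

where $\mu(p_{i-1,j}) = m_{j1}(p_{i-1,1}^{(j)}) + \cdots + m_{jq_j}(p_{i-1,q_j}^{(j)})$. When the jth gene has no parent
genes, $\mu(p_{i-1,j})$ reduces to the constant $\mu_j$.

We assume $f_1(x_1) = g_1(x_{11}) \times \cdots \times g_p(x_{1p})$ and the joint density
$f(x_{11}, \ldots, x_{np}; \theta_G)$ can then be rewritten as

$$f(x_{11}, \ldots, x_{np}; \theta_G) = \prod_{i=1}^{n} \prod_{j=1}^{p} g_j(x_{ij} \mid p_{i-1,j}; \theta_j), \qquad (3)$$

where $p_{0j} = \emptyset$. Thus, $g_j(x_{ij} \mid p_{i-1,j}; \theta_j)$ represents the local structure of the jth gene
and its parent genes.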
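
As a non-limiting sketch, the logarithm of a joint density of this kind (Gaussian local densities whose means depend on the parent genes at the previous time point) can be evaluated in Python as shown below. The function names, the representation of the parent sets as lists of gene indices, and the use of arbitrary callables in place of the fitted smooth functions are assumptions made purely for illustration.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_joint_density(X, parents, mean_fns, base_means, variances):
    """Sum of log local densities over all time points i and genes j.

    X:          n x p list of lists, rows ordered by time.
    parents:    parents[j] is a list of parent gene indices of gene j.
    mean_fns:   mean_fns[j] maps the parent values at time i-1 to a mean
                (a stand-in for the additive smooth functions).
    base_means: constant mean used when gene j has no parents, and at the
                first time point, where no earlier observation exists.
    variances:  variances[j] is the noise variance of gene j.
    """
    total = 0.0
    for i, row in enumerate(X):
        for j, x_ij in enumerate(row):
            if i == 0 or not parents[j]:
                mean = base_means[j]
            else:
                previous = [X[i - 1][k] for k in parents[j]]
                mean = mean_fns[j](previous)
            total += log_gaussian(x_ij, mean, variances[j])
    return total
```

Because the log density is a sum over genes, each gene's contribution can be computed independently of the others once its parent set is fixed, which is the property exploited when a network score is decomposed into local scores.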
Example 2: Derivation of a Criterion for Selecting a Network



The dynamic Bayesian network and nonparametric regression model introduced
in Example 1 can be constructed when we fix the network structure,
and estimated by a suitable procedure. However, the gene network is generally
unknown and we should estimate an optimal network based on the data. This
problem can be viewed as a statistical model selection problem (see e.g., Akaike
[1]; Konishi and Kitagawa [17]; Burnham and Anderson [4]; Konishi [16]). We
solve this problem from the Bayesian statistical approach and derive a criterion
for evaluating the goodness of the dynamic Bayesian network and nonparametric
regression model.



Let $\pi(\theta_G \mid \lambda)$ be a prior distribution on the parameter $\theta_G$ in the dynamic
Bayesian network and nonparametric regression model and let $\log \pi(\theta_G \mid \lambda) = O(n)$.
The marginal likelihood can be represented as

$$\int f(x_{11}, \ldots, x_{np}; \theta_G)\, \pi(\theta_G \mid \lambda)\, d\theta_G.$$

Thus, when the data is given, the posterior probability of the network G is

$$\pi_{post}(G \mid X_n) = \frac{\pi_{prior}(G) \int f(x_{11}, \ldots, x_{np}; \theta_G)\, \pi(\theta_G \mid \lambda)\, d\theta_G}{\sum_{G'} \left\{ \pi_{prior}(G') \int f(x_{11}, \ldots, x_{np}; \theta_{G'})\, \pi(\theta_{G'} \mid \lambda)\, d\theta_{G'} \right\}}, \qquad (4)$$

where $\pi_{prior}(G)$ is the prior probability of the network G. The denominator of
(4) does not relate to model evaluation. Therefore, the evaluation of the network
depends on the magnitude of the numerator. Hence, we can choose an optimal
network as the maximizer of

$$\pi_{prior}(G) \int f(x_{11}, \ldots, x_{np}; \theta_G)\, \pi(\theta_G \mid \lambda)\, d\theta_G.$$

It is clear that the essential point for constructing a network selection criterion
is how to compute the high dimensional integral. Imoto et al. [14, 15] used the
Laplace approximation for integrals (see also Tierney and Kadane [21]; Davison
[6]) and we can apply this technique to the dynamic Bayesian network
model and nonparametric regression directly. Hence, we have a criterion, named
BNRC_dynamic, of the form



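$$\begin{aligned}
\mathrm{BNRC}_{dynamic}(G) &= -2 \log \left\{ \pi_{prior}(G) \int f(x_{11}, \ldots, x_{np}; \theta_G)\, \pi(\theta_G \mid \lambda)\, d\theta_G \right\} \\
&\approx -2 \log \pi_{prior}(G) - r \log(2\pi/n) + \log |J_\lambda(\hat{\theta}_G)| - 2n\, l_\lambda(\hat{\theta}_G \mid X_n), \qquad (5)
\end{aligned}$$

where r is the dimension of $\theta_G$,

$$l_\lambda(\theta_G \mid X_n) = \log f(x_{11}, \ldots, x_{np}; \theta_G)/n + \log \pi(\theta_G \mid \lambda)/n,$$

$$J_\lambda(\theta_G) = -\partial^2 \{ l_\lambda(\theta_G \mid X_n) \} / \partial\theta_G\, \partial\theta_G^T,$$

and $\hat{\theta}_G$ is the mode of $l_\lambda(\theta_G \mid X_n)$. The optimal graph is chosen such that the
criterion BNRC_dynamic (5) is minimal.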
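
For illustration only, the Laplace-approximated form of such a criterion can be checked numerically in Python on a one-dimensional toy problem. In the sketch below the function names are hypothetical, the penalized log-likelihood is an arbitrary quadratic chosen so that the Laplace approximation is exact, and the network prior term is set to zero; none of these choices is taken from the experiments described herein.

```python
import math


def laplace_criterion(n, r, l_at_mode, logdet_J, log_prior_graph=0.0):
    """-2 log{prior * integral of exp(n * l(theta))} by the Laplace method.

    n: sample size, r: dimension of theta, l_at_mode: l evaluated at its
    mode, logdet_J: log determinant of minus the Hessian of l at the mode.
    """
    return (-2.0 * log_prior_graph - r * math.log(2.0 * math.pi / n)
            + logdet_J - 2.0 * n * l_at_mode)


def brute_force_criterion(n, l, lo, hi, steps=20000):
    """Same quantity by a midpoint Riemann sum (one-dimensional theta only)."""
    h = (hi - lo) / steps
    integral = sum(math.exp(n * l(lo + (k + 0.5) * h)) for k in range(steps)) * h
    return -2.0 * math.log(integral)


# Toy check: l(theta) = c - a * (theta - theta0)^2 / 2, so that J = a.
a, theta0, c, n = 2.0, 0.3, -1.0, 20
l = lambda t: c - 0.5 * a * (t - theta0) ** 2
approx = laplace_criterion(n, 1, l(theta0), math.log(a))
exact = brute_force_criterion(n, l, theta0 - 5.0, theta0 + 5.0)
```

For a quadratic l the two values agree up to numerical integration error; for a general smooth l the approximation error shrinks as n grows, which is why the approximation is attractive when the integral is high dimensional and cannot be computed by brute force.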



Example 3: Estimation of a Gene Network 

In this section, we show a concrete strategy for estimating a gene network from cDNA microarray
time series gene expression data.

3.1 Nonparametric Regression

We use the basis function approach for constructing the smooth function $m_{jk}(\cdot)$
described in Example 1. Here we use B-splines (de Boor [7]) as the basis
functions. De Boor's algorithm (de Boor [7], Chapter 10, p. 130 (3)) is a useful
method for computing B-splines of any degree. We use 20 B-splines with
equidistant knots (see also Dierckx [10]; Eilers and Marx [11] for the details of
B-splines).
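
A minimal Python sketch of a B-spline basis with equidistant knots is given below for illustration. It uses the standard Cox-de Boor recursion; the cubic degree, the extension of the knots beyond the data range, and all function names are assumptions of this sketch, since the degree and knot placement are not specified above.

```python
def equidistant_knots(lo, hi, n_basis=20, degree=3):
    """Equally spaced knots giving n_basis B-splines that cover [lo, hi]."""
    h = (hi - lo) / (n_basis - degree)
    # degree extra knots on each side so every point of [lo, hi) is covered
    return [lo + (m - degree) * h for m in range(n_basis + degree + 1)]


def bspline_basis(x, knots, degree=3):
    """Values at x of all B-splines of the given degree (Cox-de Boor recursion)."""
    # Degree 0: indicator functions of the half-open knot intervals.
    values = [1.0 if knots[m] <= x < knots[m + 1] else 0.0
              for m in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for m in range(len(knots) - d - 1):
            left = (x - knots[m]) / (knots[m + d] - knots[m]) * values[m]
            right = ((knots[m + d + 1] - x)
                     / (knots[m + d + 1] - knots[m + 1]) * values[m + 1])
            nxt.append(left + right)
        values = nxt
    return values


def smooth_function(x, coefficients, knots, degree=3):
    """Linear combination of basis functions with the given coefficients."""
    basis = bspline_basis(x, knots, degree)
    return sum(g * b for g, b in zip(coefficients, basis))
```

With `knots = equidistant_knots(0.0, 1.0)`, a call such as `bspline_basis(0.37, knots)` returns 20 values, of which at most four are nonzero for cubic splines; this locality is one reason B-splines are convenient for fitting smooth functions.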



3.2 Prior Distribution on the Parameter in the Model 

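For the prior distribution on the parameter $\theta_G$, suppose that the parameter
vectors $\theta_j$ are independent of one another; the prior distribution can then be
decomposed as $\pi(\theta_G \mid \lambda) = \prod_{j=1}^{p} \pi_j(\theta_j \mid \lambda_j)$. Suppose that the prior distribution
$\pi_j(\theta_j \mid \lambda_j)$ is factorized as $\pi_j(\theta_j \mid \lambda_j) = \prod_{k=1}^{q_j} \pi_{jk}(\gamma_{jk} \mid \lambda_{jk})$, where $\lambda_{jk}$ are
hyperparameters. We use a singular $M_{jk}$ variate normal distribution as the prior
distribution on $\gamma_{jk}$,

$$\pi_{jk}(\gamma_{jk} \mid \lambda_{jk}) = \left( \frac{2\pi}{n\lambda_{jk}} \right)^{-(M_{jk}-2)/2} |K_{jk}|_{+}^{1/2} \exp\left( -\frac{n\lambda_{jk}}{2}\, \gamma_{jk}^T K_{jk} \gamma_{jk} \right),$$

where $K_{jk}$ is an $M_{jk} \times M_{jk}$ symmetric positive semidefinite matrix satisfying
$\gamma_{jk}^T K_{jk} \gamma_{jk} = \sum_{\alpha=3}^{M_{jk}} (\gamma_{\alpha k}^{(j)} - 2\gamma_{\alpha-1,k}^{(j)} + \gamma_{\alpha-2,k}^{(j)})^2$. The setting of the prior
distribution on $\theta_G$ is the same as Imoto et al. [14, 15] and the details are in those
papers.

3.3 Proposed Criterion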

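By using the prior distributions in Section 3.2, the BNRC_dynamic can be decomposed
as follows:

$$\mathrm{BNRC}_{dynamic} = \sum_{j=1}^{p} \mathrm{BNRC}_{dynamic}^{(j)}, \qquad (6)$$

where $\mathrm{BNRC}_{dynamic}^{(j)}$ is a local criterion score of the jth gene and is defined by

$$\begin{aligned}
\mathrm{BNRC}_{dynamic}^{(j)} &= -2 \log \left\{ \int \pi_{prior}(L_j) \prod_{i=1}^{n} g_j(x_{ij} \mid p_{i-1,j}; \theta_j)\, \pi_j(\theta_j \mid \lambda_j)\, d\theta_j \right\} \\
&\approx -2 \log \pi_{prior}(L_j) - r_j \log(2\pi/n) + \log |J_\lambda^{(j)}(\hat{\theta}_j)| - 2n\, l_\lambda^{(j)}(\hat{\theta}_j \mid X_n),
\end{aligned}$$

where $r_j$ is the dimension of $\theta_j$,

$$l_\lambda^{(j)}(\theta_j \mid X_n) = \frac{1}{n} \sum_{i=1}^{n} \log g_j(x_{ij} \mid p_{i-1,j}; \theta_j) + \frac{1}{n} \log \pi_j(\theta_j \mid \lambda_j),$$

$$J_\lambda^{(j)}(\theta_j) = -\partial^2 \{ l_\lambda^{(j)}(\theta_j \mid X_n) \} / \partial\theta_j\, \partial\theta_j^T,$$

and $\hat{\theta}_j$ is the mode of $l_\lambda^{(j)}(\theta_j \mid X_n)$. Here $\pi_{prior}(L_j)$ are prior probabilities satisfying
$\sum_{j=1}^{p} \log \pi_{prior}(L_j) = \log \pi_{prior}(G)$. We set the prior probability of local
structure $\pi_{prior}(L_j)$ as $\pi_{prior}(L_j) = \exp\{-(\text{the number of parent genes of the jth gene})\}$.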



By using the dynamic Bayesian network and nonparametric regression model together with
the proposed criterion, BNRC_dynamic, we can formulate the network learning process as follows.
It is clear from (3) and (6) that the optimization of network structure is equivalent to the choice of
the parent genes that regulate the target genes. However, it is a time-consuming task to
consider all possible gene combinations as the parent genes. Therefore, we cut down the learning
space by selecting candidate parent genes. After this step, a greedy hill-climbing algorithm is
employed for finding better networks. Our algorithm can be expressed as follows:

Step 1: Preprocessing stage

We make the p x p matrix whose (i, j)th element is the BNRC score of the graph
"gene_i -> gene_j" and we define the candidate set of parent genes of gene_j as the genes that give
small BNRC scores. We set the number of elements of the candidate set of parent genes to 10.

Step 2: Learning stage

For a greedy hill-climbing algorithm, we start from the empty network and repeat the
following steps:

Step2-1: For gene_j, implement whichever of two procedures, adding a parent gene or deleting a parent
gene, gives the smaller BNRC_dynamic score.

Step2-2: Repeat Step2-1 for the prescribed computational order of genes until a suitable convergence
criterion is satisfied.

Step2-3: Permute the computational order for finding a better solution and repeat Step2-1 and 2-2.

Step2-4: We choose the optimal network that gives the smallest BNRC_dynamic score.
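
The two stages above can be sketched in Python as follows, by way of example only. The `local_score(j, parent_set)` callback is a hypothetical stand-in for the local criterion score of gene j given a set of parent genes; the function names, the number of restarts and the sweep limit are likewise assumptions of the sketch rather than values used in the experiments.

```python
import random


def select_candidates(n_genes, local_score, k=10):
    """Step 1: for each gene j keep the k single parents with the smallest score."""
    return {j: sorted(range(n_genes), key=lambda i: local_score(j, {i}))[:k]
            for j in range(n_genes)}


def greedy_hill_climb(n_genes, candidates, local_score,
                      n_restarts=5, max_sweeps=50, seed=0):
    """Step 2: greedy add/delete search, repeated over permuted gene orders."""
    rng = random.Random(seed)
    order = list(range(n_genes))
    best_parents, best_total = None, float("inf")
    for _ in range(n_restarts):
        parents = {j: set() for j in range(n_genes)}   # start from the empty network
        for _ in range(max_sweeps):                    # Step 2-2
            changed = False
            for j in order:                            # Step 2-1
                moves = [parents[j] | {c} for c in candidates[j]
                         if c not in parents[j]]
                moves += [parents[j] - {c} for c in parents[j]]
                if not moves:
                    continue
                best_move = min(moves, key=lambda s: local_score(j, s))
                if local_score(j, best_move) < local_score(j, parents[j]):
                    parents[j] = best_move
                    changed = True
            if not changed:                            # converged for this order
                break
        total = sum(local_score(j, parents[j]) for j in range(n_genes))
        if total < best_total:                         # Step 2-4
            best_parents, best_total = parents, total
        rng.shuffle(order)                             # Step 2-3
    return best_parents, best_total
```

Because the total score is a sum of local scores, changing the parents of one gene only requires re-evaluating that gene's local score, which keeps each greedy move inexpensive.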

Example 4: Computational Experiment 

We demonstrated one embodiment of this invention through the analysis of the
Saccharomyces cerevisiae cell cycle gene expression data collected by Spellman et al. [20]. This
data contains two short time series (two time points; cln3, clb2) and four medium time series (18, 24,
17 and 14 time points; alpha, cdc15, cdc28 and elu). In the estimation of a gene network, we used
the four medium time series. For combining the four time series, we ignored the first observation of the target
gene and the last observation of the parent genes for each time series when we fit the nonparametric regression
model.
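
One way to realize this combining rule is sketched below in Python for illustration. The function name and the list-of-lists data layout are assumptions; the sketch only shows how parent observations at one time point can be paired with target observations at the next time point without forming pairs across the boundary between two separate time series.

```python
def lagged_pairs(series_list):
    """Pair each observation with the next one, separately within each series.

    series_list: list of time series; each series is a list of expression
                 vectors ordered by time.
    Returns (parent_side, target_side), two lists of equal length in which
    parent_side[k] was observed one time point before target_side[k].
    """
    parent_side, target_side = [], []
    for series in series_list:
        parent_side.extend(series[:-1])  # drop the last observation (parents)
        target_side.extend(series[1:])   # drop the first observation (targets)
    return parent_side, target_side
```

Under this rule, four series of 18, 24, 17 and 14 time points would yield 17 + 23 + 16 + 13 = 69 paired observations for fitting the regression.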

At first, we focused on the cell cycle pathway compiled in the KEGG database [22]. The target
network is around CDC28 (YBR160w; cyclin-dependent protein kinase). This network contains
45 genes; the pathway registered in KEGG is shown in Figure 2(a) and the estimated network
is in Figure 2(b). The edges in the dotted circles can be considered the correct edges. We thus
modeled some correct relations. We denoted a correct estimation by a circle next to the edge. A
triangle represents a reversal or skip of the correct direction. The "x" symbols represent incorrect
relationships.

A second example used to demonstrate our methods is the metabolic pathway reported by
DeRisi et al. [9]. This network contains 57 genes and the target pathway is shown in Figure 3(a).

We applied a Bayesian network and nonparametric regression model [14, 15] to this data
and the resulting network is depicted in Figure 3(b). The network of Figure 3(c) was obtained by
the dynamic Bayesian network and nonparametric regression model. It is difficult to estimate the
metabolic pathway from cDNA microarray data. However, our model detected correct relationships
between the genes. Compared with the Bayesian network and nonparametric regression model, this
method produced far fewer false positives, which are marked by the "x" symbols: compare
Figure 3(c) with Figure 3(b).

All references cited herein are incorporated herein in their entirety.